Modeling correlated human dynamics 
We empirically study the activity patterns of individual blog posting and find significant memory effects. The memory coefficient [K.-I. Goh and A.-L. Barabasi, EPL 81, 48002 (2008)] first decays as a power law and then turns into an exponential form. Moreover, the inter-event time distribution displays a heavy-tailed nature, with a power-law exponent that depends on the activity. Our findings challenge the priority-queue model [A.-L. Barabasi, Nature 435, 207 (2005)], which cannot reproduce either the memory effects or the activity-dependent distributions. We argue that there is another kind of human activity pattern, driven by personal interests and characterized by strong memory effects. Accordingly, we propose a simple model based on temporal preference, which reproduces both the heavy-tailed nature and the strong memory effects. This work helps in understanding both the temporal regularities and the predictability of human behaviors.

PACS numbers: 89.75.Da, 02.50.-r



I. INTRODUCTION 

Human actions underlie many social, technological and economic phenomena, and thus a quantitative understanding of human behavior is of great significance [1, 2]. Thanks to the development of information technology, more and more electronic records available from the Internet may provide insights into the patterns of human behavior [3, 4]. In recent years, empirically studied human activities have included communication patterns of e-mail [5-7] and surface mail [8, 9], web surfing [10, 11], short messages [12], movie ratings [13], and online games [14, 15]. The main result arising from all these studies concerns the heavy-tailed nature of human activity: the inter-event and/or response times follow a power-law-like distribution at both the population and the individual level.

A possible explanation of the heavy-tailed nature is the priority-queue model first introduced by Barabasi [5, 16], in which human behavior is primarily driven by rational decision making. Another possible origin is the cascading nonhomogeneous Poisson process, which emphasizes external factors such as circadian and weekly cycles [6, 9, 17]. Although both models can give rise to heavy-tailed distributions of inter-event and/or response times, they lack any internal correlation among the activities of humans, the most complex creatures on earth. However, common sense suggests that our activities should display memory effects, since personal tastes and interests are known to have both long-term consistency and short-term burstiness. Long-term temporal memory has already been observed in some human-initiated systems [18, 19]. Moreover, the significant potential predictability found in human mobility can be considered complementary evidence of spatial memory in human activities [20]. Actually, our daily activities can be roughly divided into two classes: things we have to do and things we want to do. Sending e-mails, making calls, submitting programs on Linux servers and printing papers can be seen as the first class; these are important yet may not interest us. In contrast, entertainment activities, such as listening to music, watching movies and reading books, are driven by personal interests and thus belong to the second class. Models considering adaptive interests can, to some extent, reproduce the memory effects [21-23]. Besides the memory effects, extensive empirical analyses of more than 10 real systems [12, 13, 24, 25] have shown that individual activity (i.e., the frequency of an individual's actions) plays an important role in determining the inter-event time distribution: the larger the activity, the narrower the distribution. Neither the memory effects nor the activity-dependent distributions can be reproduced by the priority-queue model with its two universality classes [16].

In this paper, we empirically study the activity patterns of individual blog posting and find significant memory effects. Moreover, the inter-event time distribution displays a heavy-tailed nature with a power-law exponent that depends on the activity. We propose a simple model based on temporal preference, which reproduces both the heavy-tailed nature and the strong memory effects. This paper is organized as follows. In the next section, we introduce the empirical observations, followed by the model and simulation results in Section III. We conclude this work in Section IV with some discussion of the relevance of our work to real human behavior.



II. EMPIRICAL ANALYSIS 

Blogging is a so-called Web 2.0 application that has emerged in recent years, in which people post articles and

FIG. 1: (Color online) The distribution of the number of posts. There are two scaling regimes, with exponents -1.48 and -0.87. The shaded part corresponds to users whose number of posts is greater than 200.



read and comment on one another's posts. Most ordinary bloggers (blog users) post according to their own interests and treat blogging only as an amusement or an optional way of communicating with friends. This makes their blogging irregular, and its frequency can be relatively low (often about once a day). Our data were collected from a campus blog website of Nanjing University (http://bbs.nju.edu.cn/blogall). Most users are current or former students and teachers of Nanjing University. As of 01/09/2009, there were 1627697 posts belonging to 20379 users on this website. The first post is dated 25/03/2003, when the blog was established. In Fig. 1, the distribution of the number of posts per user decays as a so-called double power law. The same result was also reported by Grabowski [26], who suggested that two groups of people give rise to the two scaling regimes.
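Count distributions like the one in Fig. 1 are commonly estimated with logarithmic binning before slopes are read off on log-log axes. The paper does not state how its histograms were built, so the following is only a sketch under that assumption; the function name `log_binned_pdf` and the bin ratio are illustrative choices, not the authors' code:

```python
def log_binned_pdf(values, ratio=2.0):
    """Estimate a probability density with logarithmic bins.

    Bin b covers [ratio**b, ratio**(b + 1)); values below 1 are skipped.
    Returns a list of (geometric bin center, density) pairs, where the
    density is the bin count divided by the bin width and the sample size.
    """
    total = len(values)
    counts = {}
    for v in values:
        if v < 1:
            continue
        # Find the bin index by repeated multiplication (avoids float log rounding).
        b, edge = 0, ratio
        while v >= edge:
            edge *= ratio
            b += 1
        counts[b] = counts.get(b, 0) + 1
    pdf = []
    for b in sorted(counts):
        lo, hi = ratio ** b, ratio ** (b + 1)
        pdf.append(((lo * hi) ** 0.5, counts[b] / ((hi - lo) * total)))
    return pdf
```

Applied to the list of per-user post counts, a straight-line fit of log density against log bin center within each regime would give slopes comparable to the two exponents quoted for Fig. 1.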

The heavy-tailed nature of the global distribution of interevent times of all users is shown in Fig. 2a. The exponent of this distribution is -1.98, which is very close to that found for movie ratings [13] and for web activities on AOL and eBay [24]. Figure 2b shows the global interevent time distribution of users with more than 600 posts. One peculiarity of this distribution is its excess of long intervals, visible for interevent times τ > 200. A similar feature can also be found in the distribution of a single user (see the inset of Fig. 4). It is absent in Fig. 2a, however, which indicates that only mature users with many posts show this feature. We can also see that the exponent in Fig. 2b is larger than that in Fig. 2a. This is consistent with the dependence between the exponent and activity studied below, since users with more posts tend to be more active.
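The exponents quoted in this section come from fitting the "shifted power law" y ~ (x + a)^(-β) of Fig. 2. The fitting procedure is not described in the text; as one possible sketch, assuming a grid search over the shift a and ordinary least squares of log y on log(x + a), with the hypothetical helper `fit_shifted_power_law`:

```python
import math


def fit_shifted_power_law(xs, ys, shifts):
    """Fit y = C * (x + a) ** (-beta) by least squares in log-log space.

    For each candidate shift a, regress log(y) on log(x + a) and keep the
    shift with the smallest residual sum of squares.
    Returns (beta, a, C).
    """
    ly = [math.log(y) for y in ys]
    n = len(ly)
    my = sum(ly) / n
    best = None
    for a in shifts:
        lx = [math.log(x + a) for x in xs]
        mx = sum(lx) / n
        sxx = sum((u - mx) ** 2 for u in lx)
        sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
        slope = sxy / sxx
        intercept = my - slope * mx
        rss = sum((v - intercept - slope * u) ** 2 for u, v in zip(lx, ly))
        if best is None or rss < best[0]:
            best = (rss, -slope, a, math.exp(intercept))
    _, beta, a, c = best
    return beta, a, c
```

Least squares in log space is simple but gives the sparse tail the same weight as the well-sampled head; maximum-likelihood fitting is a common alternative.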

Following the approach in [13], we sort users in ascending order of activity A_i = n_i/d_i, where n_i is the total number of posts of user i and d_i is the time between the




FIG. 2: (Color online) (a) The distribution of interevent times of the whole population; n is the number of interevent times of length τ. We fit this distribution with the so-called "shifted power law" y ~ (x + a)^(-β) [30]. For (a), the exponent β = 1.98 and a = 2.1. (b) The global interevent time distribution of the users whose number of posts is more than 600; the exponent β = 2.42 and a = 1.0. The average activity of these users is 0.76, which may explain why the exponent is larger than that in (a).



first and the last post. For simplicity, we divide these users into only two groups: one consists of the top 14000 users in the list above, with average activity ⟨A⟩ = 0.04 per day; the other is the remainder, containing 6379 users, with ⟨A⟩ = 0.64 per day. As we can see in Fig. 3, the decay exponent of the group-level interevent time distribution increases from 1.86 to 2.47 as the activity increases from 0.04 to 0.64.
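The activity A_i = n_i/d_i and the two-group split can be sketched as follows. This assumes each user's post times are available as a sorted list of numbers in days; the function names and the data layout are hypothetical:

```python
def interevent_times(post_times):
    """Differences between consecutive post times (assumed sorted)."""
    return [b - a for a, b in zip(post_times, post_times[1:])]


def activity(post_times):
    """A_i = n_i / d_i: number of posts over the span from first to last post."""
    span = post_times[-1] - post_times[0]
    if span <= 0:
        raise ValueError("need at least two distinct post times")
    return len(post_times) / span


def split_by_activity(users, n_low):
    """Rank users by ascending activity; the first n_low form group 1.

    `users` maps a user id to a sorted list of post times (e.g. in days).
    Returns (group1, group2, mean_activity1, mean_activity2).
    """
    ranked = sorted(users, key=lambda u: activity(users[u]))
    group1, group2 = ranked[:n_low], ranked[n_low:]

    def mean_activity(group):
        return sum(activity(users[u]) for u in group) / len(group)

    return group1, group2, mean_activity(group1), mean_activity(group2)
```

With the full data set, `split_by_activity(users, 14000)` would correspond to the grouping described above; the interevent times of each group would then be pooled with `interevent_times` before fitting.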

For individual behavior, we only consider users with more than 200 posts, to avoid characterizing users who post too little. There are 2211 qualified users. First, we take one user as an example. As shown in Fig. 4, both the distribution and the cumulative distribution of this user's interevent times decay asymptotically as a power law. The exponent of the cumulative distribution of this user is 1.40; correspondingly, the exponent of the distribution is 2.40. The average exponent over all qualified users is 2.23. The interevent times between consecutive events of this user are shown in Fig. 5a; for comparison, Fig. 5b shows those of another user. These plots give a visual impression of the behavior. One important feature is the clustering of extremely long interevent times, which is also called the mountain-valley structure and can be
FIG. 3: (Color online) The distribution of interevent times of the two groups. The fitting results are as follows: for group 1, β = 1.86 and a = 5.5; for group 2, β = 2.47 and a = 3.0. The exponent β increases from 1.86 to 2.47 as the activity increases from 0.04 to 0.64.






FIG. 4: (Color online) The cumulative distribution of interevent times of one user; the inset shows the distribution of interevent times. The exponent of the cumulative distribution of this user is 1.40; correspondingly, the exponent of the distribution is 2.40.



found in many complex systems [27, 28]. Long-term changes can also be found for these users: in Fig. 5b, the posting frequency is clearly lower in the span of event numbers 200-400.

The features above inspire us to investigate the memory coefficient of this sequence, although a lack of memory in human activities was already reported in [29]. The memory coefficient is defined as follows [29]:



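$$M_k = \frac{1}{n_\tau - k}\sum_{i=1}^{n_\tau - k}\frac{(\tau_i - m_1)(\tau_{i+k} - m_2)}{\sigma_1 \sigma_2}, \qquad (1)$$

where τ_i are the interevent time values and n_τ is the number of interevent times; m_1 (m_2) and σ_1 (σ_2) are the sample mean and sample standard deviation of the τ_i's (τ_{i+k}'s). The two interevent times τ_i and τ_{i+k} are separated by k events.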
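Equation (1) is a Pearson-type correlation between interevent times k events apart. A minimal sketch of it and of the average over users follows, assuming the standard deviations use the same 1/(n_τ - k) normalization as the prefactor (the text does not say which normalization was used); `memory_coefficient` and `average_memory` are hypothetical names, and a sequence of constant interevent times would make σ zero and raise an error:

```python
import math


def memory_coefficient(taus, k):
    """M_k of Eq. (1): correlation between tau_i and tau_{i+k}.

    Uses the n_tau - k pairs (tau_i, tau_{i+k}); the means and standard
    deviations are taken over the first and second members of the pairs,
    with the same 1/(n_tau - k) normalization as the prefactor.
    """
    n_pairs = len(taus) - k
    if n_pairs < 2:
        raise ValueError("sequence too short for lag k")
    first, second = taus[:n_pairs], taus[k:]
    m1 = sum(first) / n_pairs
    m2 = sum(second) / n_pairs
    s1 = math.sqrt(sum((x - m1) ** 2 for x in first) / n_pairs)
    s2 = math.sqrt(sum((y - m2) ** 2 for y in second) / n_pairs)
    total = sum((x - m1) * (y - m2) for x, y in zip(first, second))
    return total / (n_pairs * s1 * s2)


def average_memory(sequences, k):
    """Average M_k over users, one interevent-time sequence per user."""
    values = [memory_coefficient(seq, k) for seq in sequences]
    return sum(values) / len(values)
```

With this normalization M_k always lies in [-1, 1]; an alternating sequence such as 1, 5, 1, 5, ... gives M_1 = -1 and M_2 = 1.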

Here, we calculate M_k for all qualified users with k ranging from 1 to 40. Because a single user's number of posts is still so small that the decay curve of M_k for one user fluctuates strongly, we only study the average M_k over all users. In Fig. 5c, M_1 = 0.21, which shows that there is strong memory between nearest-neighbor interevent times. Interestingly, there are two regimes in this decay curve: for k < 10, it decays asymptotically as a power law, M_k ≈ 0.23 k^(-0.45); for k > 10, it decreases exponentially, M_k ≈ 0.1 exp(-k/23.76). This suggests that the short-term and long-term memory in this behavior are likely due to different mechanisms. For k > 10, the decay curve reminds us of the Ebbinghaus forgetting curve, which is also exponential. It is possible that the exponential decay of our own memory leads to the same decay of the long-term memory in this behavior. For k < 10, the memory is much stronger and evidently has something important to do with the mountain-valley structure above, and possibly even with the heavy tails of the distribution.

To sum up, there are three important features in this behavior: the heavy-tailed distribution of interevent times with an exponent β ≈ 2; the dependence between the exponent and activity; and the strong memory. Stochastic models such as the priority-queue model [5, 16] and the cascading nonhomogeneous Poisson process [6, 17] clearly cannot be the mechanism behind these correlated human dynamics. As for previous memory-based models, the adaptive interest model [22] can only give a distribution with exponent β = 1, while the exponent from Vazquez's model can exceed 2 [21]. However, the other two features are absent from the discussion of both models, and neither can explain the detail of the "excess of long intervals". Below we propose a very simple model to explain all the characteristics above.



III. MODEL AND SIMULATION RESULTS 

Before building our model, let us first gain some intuition from daily life. After a long day's work, we have some free time and need to do something relaxing. There are always many choices: we can exercise outside, see a movie, listen to music, write something on our own website as in blogging, and so on. In most cases, we choose among them based on our personal preferences, which are very diverse and differ from person to person. However, in one way people are alike: the more interested someone is in an activity, the more frequently he does it. On the other hand, occasionally people like a change and try something new, or something they have not done for a long time.

So we assume that there are N choices which can be 
FIG. 5: (Color online) (a), (b) The interevent times of consecutive events of two users whose numbers of posts are 666 and 390, respectively. The user in Fig. 4 is the same as in (a). (c) The average memory coefficient of all qualified users for different k.



regarded as different forms of entertainment. At each time step, the agent selects one of them according to two choosing rules:

(1) If one of the N choices was selected T times in the past M steps, the probability of choosing it at the current time step is T/M.

(2) One of the N choices is picked uniformly at random.

The second rule is executed with probability R and the first rule with probability 1 - R. Here, the interevent time τ is the number of steps between two consecutive selections of the same choice. From the description above, the probability P_i of choosing i at the current step is



$$P_i = (1 - R)\frac{T_i}{M} + \frac{R}{N}, \qquad (2)$$

where T_i is the number of times i was selected in the past M steps.
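The two choosing rules can be simulated directly. Drawing a uniformly random entry from the last M selections picks choice i with probability T_i/M, so rule 1 needs no explicit counting. The sketch below is our own implementation built on that observation, not the authors' program; the function name and the circular buffer are illustrative, while the defaults follow the values quoted in the text (N = 6, R = 0.005, M = 160):

```python
import random


def simulate(n_choices=6, memory=160, r=0.005, steps=200_000, watched=0, seed=None):
    """Simulate the temporal-preference model of Eq. (2).

    Rule 2 (probability r): pick one of the n_choices uniformly at random.
    Rule 1 (probability 1 - r): copy a uniformly random entry of the last
    `memory` selections, which picks choice i with probability T_i / M.
    Returns the interevent times (in steps) of the choice `watched`.
    """
    rng = random.Random(seed)
    # Initial condition: M random selections fill the memory window.
    window = [rng.randrange(n_choices) for _ in range(memory)]
    oldest = 0  # index of the oldest entry in the circular buffer
    taus, last = [], None
    for t in range(steps):
        if rng.random() < r:
            choice = rng.randrange(n_choices)
        else:
            choice = window[rng.randrange(memory)]
        # Overwrite the oldest selection so the window keeps the last M steps.
        window[oldest] = choice
        oldest = (oldest + 1) % memory
        if choice == watched:
            if last is not None:
                taus.append(t - last)
            last = t
    return taus
```

The paper runs 2,000,000 steps for Fig. 6; pooling the returned τ values into a histogram should show a heavy tail for M = 160 and a nearly exponential shape for very large M, although this sketch has not been checked against the published figures.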

In all simulations in this paper, N is fixed to 6, which means that we ignore new hobbies. Clearly, the distribution of interevent times approaches an exponential one when R becomes rather large, so R must be small, but not too small: if the agent followed the preference rule exclusively, it would select only one of the choices repeatedly and ignore the others. In our simulations, we set R = 0.005. In the limit M → ∞, the distribution of interevent times also becomes exponential (see Fig. 6b). This shows that the preference must be temporal; otherwise there would be no heavy tails. M, the most important parameter of our model, controls how temporal this preference is. The distributions of interevent times for M = 1000, 20000 and 160 are shown in Fig. 6. As we can see, the exponent of the distribution for M = 160 is 2.19, in agreement with the empirical values above; therefore M defaults to 160 in the simulations below.

It is noteworthy that the distribution in our model also shows an "excess of long intervals" similar to that in Fig. 2b and Fig. 4. In fact, an interevent time τ longer than M can only end through the second choosing rule, since a choice that has not been selected within the past M steps has T_i = 0. Therefore, for τ > M the distribution deviates from the power law (see the part τ > M in Figs. 6a and 6c). According to our data analysis, however, such extra-long return times seem to occur only for loyal users who have already posted a certain number of articles, while most users simply abandon their blogs after a long pause in updating (see Fig. 2). Through the second choosing rule, our simulation effectively assumes that the agent is a "super loyal user" who continues no matter how long he pauses. That may be why the distribution given by our simulation has so many long intervals.

To obtain enough samples, we ran our program 100 times, for 200,000 steps each time. The selections made in the last 60,000 steps of each run were recorded as the behavior of one individual, so we obtained 100 series belonging to 100 different "users". We then calculated A_i for each "user" and sorted the users accordingly. We chose the top 60 "users" as one group, with ⟨A⟩ = 0.055 per step; the other group contains the remaining 40 "users", with ⟨A⟩ = 0.295 per step. From Fig. 6d, we can see a similar dependence between the exponent and activity: the exponent β increases from 2.10 to 2.41 as the activity increases from 0.055 to 0.295.

In Fig. 7, the corresponding memory coefficient of our simulation decays as a power law, which fits the short-term memory obtained from the data well. For the long-term part (k > 10), the coefficient of the real data decreases exponentially with k and is about 0.01 smaller than that of our model. The cause of this discrepancy in the long-term part may be that we do not take memory fading into account. This has two aspects: in the first choosing rule, all past selections within the M steps have the same influence, whereas in real behavior their influence should decay with time; on the
FIG. 6: (Color online) The distribution of interevent times given by our simulation, with N = 6 and R = 0.005 (the same as in Fig. 7). We only observed the selections of one of the N choices and ignored the others. Initially, we make M random choices as the initial condition. The program ran 2,000,000 steps to obtain a stable exponent of the distribution. The results for M = 1000, M = 20000 and M = 160 are shown in (a), (b) and (c), respectively. For (a), the exponent β = 2.92 and a = 4.0; the distribution in (b) is clearly close to an exponential one; for (c), β = 2.19 and a = 0.4. The distributions of interevent times of the two groups, with the same parameters as in (c), are shown in (d). For group 1, ⟨A_1⟩ = 0.055, β = 2.10 and a = 1.5; for group 2, ⟨A_2⟩ = 0.295, β = 2.41 and a = 0.2.



other hand, we assumed that people never change their hobbies and are trapped among the N choices, whereas in real life there is always a possibility of finding a new hobby and abandoning an old one. However, in our opinion, the discrepancy is too small to affect the production of heavy tails in this behavior. The short-term memory, which is much stronger and plays a key role in the origin of the heavy tails, is well reproduced by our model.




FIG. 7: (Color online) Comparison between the memory coefficient of the empirical data and that of the simulation. Here, with the same parameters as in Fig. 6(c), we perform 1000 simulations of 2,000,000 steps each and average M_k.



IV. DISCUSSION 

Human behavior is highly complex. Different activities can follow different behavior patterns, and the same activity can be affected by multiple factors. To fully understand the patterns of human behavior, we must investigate a wider variety of activities in greater detail, not just the exponent of the interevent time distribution. In this paper, we investigated not only the heavy tails in the activity pattern of blog posting but also its memory and the role of activity. In our opinion, this shows that there is another kind of activity with a similar heavy-tailed nature but a different origin. Although the influence of seasonal cycles might also be found in this behavior, the short-term memory cannot be explained by models such as the nonhomogeneous Poisson process [6, 9].

One interesting result is that the decay curve of M_k has two regimes: for the short-term part (k < 10), it decays as a power law; for the long-term part (k > 10), it decreases exponentially. Based on the personal preference rule, "the more we have done it recently, the more likely we are to do it next", numerical simulations give the strong short-term memory, but a mechanism for long-term change is absent. That may be why M_k in the real data is slightly smaller than in our simulation for k > 10. A recent study also pays attention to this kind of long-term change in human activity [7]. At the individual level, this change may result from shifts in personal interests and needs, which seem very unpredictable. However, at the population level, the exponential decay of memory hints that it is related to memory fading.

Another feature reproduced by our model is the dependence between activity and the exponent of the interevent time distribution. According to the first rule of our model, the selection probability depends linearly and positively on the temporal activity T_i/M, which may be the cause of this dependence. Our model also implies that a symbiotic relationship exists between the strong short-term memory and this dependence. Further investigation is needed in this direction.

Due to the complexity of the problem, it is impractical to expect this simple model to match the empirical results accurately. One important extension of this model would be to consider the effect of memory fading, as discussed above. It is still an open question how to do this. One easy way would be to assume that the weight of influence of past choices decays with time. However, is there a more natural way? We would like to find the mechanism of memory and figure out why its strength decays exponentially. One reason for forgetting an old hobby may be finding a new one, but the detailed process is still unknown. Interaction is another factor that needs to be taken into consideration. Humans, as social beings, live in a network knitted by friends and relatives and cannot avoid its influence. But interaction seems to be secondary for the heavy tails, since the interevent time distributions of some systems without interaction also have heavy tails [24], and a model with interactions shows that interaction just increases the value of the exponent [31]. If stimulation from friends makes people more active, our model naturally includes this effect, since the exponent from our model also increases with activity.
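The "easy way" mentioned above can be sketched by letting each past selection contribute a weight that decays geometrically with its age, instead of the flat count T_i/M over a fixed window. This is purely a hypothetical illustration of that suggestion, not a model studied in this paper; the decay factor `lam` (effective memory length roughly 1/(1 - lam) steps) and the uniform initial weights are assumptions:

```python
import random


def simulate_fading(n_choices=6, lam=0.99, r=0.005, steps=200_000, watched=0, seed=None):
    """Hypothetical memory-fading variant of the temporal-preference model.

    Each past selection contributes weight lam**age; rule 1 then picks
    choice i with probability w_i / sum(w). Rule 2 (probability r) is
    unchanged. Returns the interevent times of the choice `watched`.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_choices  # assumed uniform initial preference
    taus, last = [], None
    for t in range(steps):
        if rng.random() < r:
            choice = rng.randrange(n_choices)
        else:
            choice = rng.choices(range(n_choices), weights=weights)[0]
        # Age all past selections by one step, then add the new selection.
        weights = [w * lam for w in weights]
        weights[choice] += 1.0
        if choice == watched:
            if last is not None:
                taus.append(t - last)
            last = t
    return taus
```

Whether such a smooth decay reproduces the exponential tail of M_k for k > 10 is an open question that this sketch does not answer.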

In our opinion, this strong short-term memory correlation should be common in activities that are more a matter of personality than of task. We hope that more empirical studies will be made in the future. Actually, people have preferences not only among different activities but also among different types of the same activity. Taking watching movies as an example, when we decide to watch a DVD, there are usually many kinds of movies: romance, sci-fi, classics, horror, and so on. In this case, the N choices of our model can be regarded as different types of movies. In the dynamic evolution of social networks, we can also treat one's friends as the choices of our model when people choose one of their friends to contact. In our opinion, personal preferences can be found in many activities and at different levels.

The memory or correlation of human behavior has much to do with its predictability. One's friends can understand and predict one's behavior better because they know one's past, and one's past, present and future are interconnected. A piece of music that many people like will probably be liked by you as well, since we influence each other and correlations can be found among our preferences. Revealing these correlations in human behavior and uncovering the mechanism of correlated human dynamics are of great significance for many fields, such as link prediction and recommender systems. Take recommender systems as an example. All recommendation algorithms, including the famous PageRank algorithm used by Google, are based only on empirical hypotheses. Practice shows that they work, but why do they work? And how can we find the best recommendation algorithm? Without fully understanding human behavior, especially the relation between one's past and present behavior, we cannot answer these questions well.